Memory Hierarchies in Intelligent Memories: Energy/Performance Design

Authors

  • Jose Renau
  • Josep Torrellas
Abstract

The dramatic increase in the number of transistors that can be integrated on a chip, coupled with advances in Merged Logic DRAM (MLD) technology, is fueling interest in Processor In Memory (PIM) architectures. A promising use of these architectures is as the intelligent memory system of a workstation or server. In such a system, each memory chip includes many simple processors, each of which is associated with one or more DRAM banks. Such a design extracts high bandwidth from the DRAM. Recent advances in MLD technology allow the on-chip logic transistors to cycle as fast as in logic-only chips, causing a speed mismatch between the high-speed on-chip processors and the slow DRAM banks. Furthermore, the presence of so many processors on chip, all accessing memory, may create thick spikes of energy consumption. In this thesis, I address how to design an efficient memory hierarchy inside an intelligent memory chip. This is a multi-dimensional problem that involves optimizing for performance, energy efficiency and, to a lesser extent, area efficiency. The thesis also examines and evaluates simple hardware techniques that eliminate excessive power consumption using real-time corrective support. The results indicate that, to minimize the energy-delay product, each DRAM bank should include a sizable cache of about 8 Kbytes and support segmentation and interleaving, with pipelining optional. Furthermore, a spectrum of real-time corrective schemes to limit power consumption is evaluated. Of these schemes, gating the clock offers the best tradeoff.
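The energy-delay product named in the abstract is the product of the energy a design consumes and the time it takes to run a workload; lower is better. The following is a minimal sketch of how candidate memory-hierarchy configurations could be ranked by that metric. The `Config` type, the configuration names, and all energy and delay values are hypothetical placeholders chosen for illustration; they are not results from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate design point (hypothetical; not taken from the thesis)."""
    name: str
    energy_joules: float   # total energy for the workload
    delay_seconds: float   # total execution time for the workload


def energy_delay_product(cfg: Config) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return cfg.energy_joules * cfg.delay_seconds


def best_by_edp(configs: list[Config]) -> Config:
    """Return the configuration with the lowest energy-delay product."""
    if not configs:
        raise ValueError("need at least one configuration")
    return min(configs, key=energy_delay_product)


# Placeholder numbers, chosen only to exercise the functions.
candidates = [
    Config("config_a", energy_joules=2.0, delay_seconds=3.0),
    Config("config_b", energy_joules=1.5, delay_seconds=2.0),
    Config("config_c", energy_joules=2.5, delay_seconds=1.5),
]

if __name__ == "__main__":
    for cfg in candidates:
        print(cfg.name, energy_delay_product(cfg))
    print("lowest EDP:", best_by_edp(candidates).name)
```

Multiplying the two quantities penalizes designs that save energy only by running much slower, and designs that run faster only by spending much more energy, which is why it suits a joint energy/performance study.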

Similar resources

Jenga: Harnessing Heterogeneous Memories through Reconfigurable Cache Hierarchies

Conventional memory systems are organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, because working sets settle at the smallest (and fastest) level they fit in. However, rigid hierarchies also cause significant overheads, because each level adds latency and energy even ...

Jenga: Software-Defined Cache Hierarchies

Caches are traditionally organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, since working sets settle at the smallest (i.e., fastest and most energy-efficient) level they fit in. However, rigid hierarchies also add overheads, because each level adds latency and energy ...

Memory and Storage System Design with Nonvolatile Memory Technologies

The memory and storage system, including processor caches, main memory, and storage, is an important component of various computer systems. The memory hierarchy is becoming a fundamental performance and energy bottleneck, due to the widening gap between the increasing bandwidth and energy demands of modern applications and the limited performance and energy efficiency provided by traditional me...

Energy/Performance Design of Memory Hierarchies for Processor-in-Memory Chips

Merging processors and memory into a single chip has the well-known benefits of allowing high-bandwidth and low-latency communication between processor and memory, and reducing energy consumption. As a result, many different systems based on what has been called Processor In Memory (PIM) architectures have been proposed [14, 2, 6, 7, 9, 11, 12, 13, 15, 16, 18]. Recent advances in technology [3, ...

Constructing Application-Specific Memory Hierarchies on FPGAs

The high performance potential of an FPGA is not fully exploited if a design suffers a memory bottleneck. Therefore, a memory hierarchy is needed to reuse data in on-chip buffer memories and minimize the number of accesses to off-chip memory. Buffer memories not only hide the external memory latency, but can also be used to remap data and augment the on-chip bandwidth through parallel access of...

Publication date: 2000